The qgam R package offers methods for fitting additive quantile regression models based on splines, using the methods described in Fasiolo et al., 2017. It is useful to use qgam together with mgcViz, which extends the basic visualizations provided by mgcv.

The main fitting functions are:

Quantile modelling: the motorcycle data set

Let us consider the motorcycle data set:

library(MASS)
data(mcycle)
plot(mcycle)

A good Gaussian GAMLSS model for this data is

library(mgcViz)

fitG <- gamV(list(accel ~ s(times, k=20, bs="ad"), 
                        ~ s(times)),
             data=mcycle, 
             family=gaulss())

print(plot(fitG), pages = 1)

We can obtain the quantiles by inverting the estimated Gaussian CDF

qu <- c(0.1, 0.25, 0.5, 0.75, 0.9)

q_hat <- lapply(qu, 
                function(.q)
                  qnorm(.q, mean = fitG$fitted.values[ , 1], 
                            sd = 1 / fitG$fitted.values[ , 2]))

and we can then plot them on top of the data

plot(mcycle, ylim = c(-150, 75))
for(ii in 1:5) lines(mcycle$times, q_hat[[ii]], col = 2)

Now we fit a quantile GAM for quantile 0.9

fit09 <- qgam(accel ~ s(times, k=20, bs="ad"), 
              data = mcycle, 
              qu = 0.9)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.9....................done

We now plot the quantile with confidence intervals

pred <- predict(fit09, se = TRUE)

plot(mcycle, ylim = c(-140, 85))
lines(mcycle$times, pred$fit)
lines(mcycle$times, pred$fit + 2 * pred$se, col = 2)
lines(mcycle$times, pred$fit - 2 * pred$se, col = 2)

There are two problems with this fit:

We address these issues by modelling the learning rate as well

fit09_2 <- qgam(list(accel ~ s(times, k=20, bs="ad"),
                           ~ s(times)),
                 data = mcycle, 
                 qu = 0.9)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.9..........done

Again predict and plot

pred <- predict(fit09_2, se = TRUE)

plot(mcycle, ylim = c(-140, 90))
lines(mcycle$times, pred$fit)
lines(mcycle$times, pred$fit + 2 * pred$se, col = 2)
lines(mcycle$times, pred$fit - 2 * pred$se, col = 2)

Looks much better! Recall that

class(fit09_2)
## [1] "qgam" "gam"  "glm"  "lm"

hence we can do, for instance

summary(fit09_2)
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## accel ~ s(times, k = 20, bs = "ad")
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)    2.182      1.755   1.244    0.214
## 
## Approximate significance of smooth terms:
##            edf Ref.df Chi.sq p-value    
## s(times) 11.05  12.66    393  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.676   Deviance explained = 97.3%
## -REML = 578.21  Scale est. = 1         n = 133

We can use mgcViz plots by converting as usual

fit09_2 <- getViz(fit09_2)

or using shortcut

fit09_2 <- qgamV(list(accel ~ s(times, k=20, bs="ad"),
                            ~ s(times)),
                 data = mcycle, 
                 qu = 0.9) 

We fit multiple quantiles using the mqgam function

fitMQ <- mqgam(list(accel ~ s(times, k=20, bs="ad"),
                            ~ s(times)),
               data = mcycle, 
               qu = c(0.1, 0.25, 0.5, 0.75, 0.9)) 
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5..............done 
## qu = 0.25..........................done 
## qu = 0.75.........done 
## qu = 0.1..............done 
## qu = 0.9..........done

This should be manipulated using the qdo function

qdo(fitMQ, 0.1, summary)
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## accel ~ s(times, k = 20, bs = "ad")
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -54.535      2.367  -23.04   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##            edf Ref.df Chi.sq p-value    
## s(times) 8.323  9.153  932.5  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.649   Deviance explained = 89.9%
## -REML = 593.53  Scale est. = 1         n = 133

To plot one of the fitted quantile models we do

qdo(fitMQ, 0.1, plot)

To plot effect of several quantile at once we use mgcViz

fitMQ_viz <- getViz(fitMQ)

plot(fitMQ_viz)

We can plot the fitted quantile on the data

plot(mcycle, ylim = c(-150, 90))
for(ii in 1:5) lines(mcycle$times, predict(fitMQ_viz[[ii]]), col = 2)

NB output of getViz is simply a list of gam models, we don’t need to use qdo. But there is a memory price to pay (small for this small data set)

object.size(fitMQ) / 1e6
## 1 bytes
object.size(fitMQ_viz) / 1e6
## 1.2 bytes

To recap we can do

Rainfall modelling in Parana state of Brasil

Here we are going to model weekly rainfall (mm/week) from over 600 weather station in the Parana state of Brazil. The data comes comes from the INLA R package. The meaning of most variables should be evident.

library(testGam)
data("parana")

plot(parana[parana$TIME==1, ]$LO, parana[parana$TIME==1, ]$LA, xlab = "LO", ylab = "LA") 

We fit a quantile GAM for \(\tau = 0.8\) with an isotropic effect for longitude and latitude, a cyclic effect for the time of the year and smooth effects for distance from the sea and elevation:

library(mgcViz)
fit <- qgamV(PREC ~ s(LO, LA, k = 25) + s(seaDist) + s(TIME, bs = "cc") + s(EL),
             data = parana,
             qu = 0.8)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.8............done
print(plot(fit), pages = 1)

Hence we can do model checking in mgcViz. However standard tools are inadequate

qq(fit, CI = "normal") 

But we can verify number of observations falling below the fit

mean( (parana$PREC - fit$fitted.values) < 0 )
## [1] 0.803384

and mgcViz provides a specific layer for doing this along one covariate

check1D(fit, x = "EL") + l_gridQCheck1D() 

or two covariates

check2D(fit, x1 = "LO", x2 = "LA") + l_gridQCheck2D() 

We can of course fit this model to several quantiles:

fitM <- mqgamV(PREC ~ s(LO, LA, k = 25) + s(seaDist) + s(TIME, bs = "cc") + s(EL),
               data = parana,
               qu = seq(0.1, 0.9, length.out = 5))
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5................done 
## qu = 0.3...................done 
## qu = 0.7...............done 
## qu = 0.1......................done 
## qu = 0.9............done
plot(fitM, select = 1)

print(plot(fitM, select = 2:4), pages = 1)

Notice that the spatial effect seems much stronger for high quantiles than for the low ones. The same is true for distance from the sea and seasonality (TIME), while the effect of elevation is not significant from qu = 0.9. We can visualize the spatial effect in 3D as follows:

plotRGL(sm(fitM[[5]], 1))

We might also wonder whether the spatial effect changes with time. To verify this here we construct a tensor product smooth between the 2D thin-plate-spline spatial effect and the cyclical effect of time. We simplify the model by removing the effect of seaDist.

fit4 <- qgamV(PREC ~ te(LO, LA, TIME, d = c(2, 1), k = c(20, 10), bs = c("tp", "cc")) + s(EL),
              data = parana,
              qu = 0.9)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.9........done
plotSlice(sm(fit4, 1),
          fix = list("TIME" = round(seq(1, 53, length.out = 6)))) + l_fitRaster()

You can plot any slice in 3D by doing:

plotRGL(sm(fit4, 1), fix = c("TIME" = 11))